Word Clustering for Data Sparsity: A Literature Survey

Author

Kashyap Popat

Abstract

In this report, we present the literature survey conducted for our work on sentiment analysis (SA) and other NLP applications. The report is organized as follows. Section 1 introduces the clustering process and describes a few existing word clustering techniques. Section 2 discusses the smoothing process, and Section 3 explains why clustering is better suited to our work. Finally, Section 4 reviews related work on NLP applications in which word clusters serve as helpful features.

Similar resources

Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition

In this paper, the use of clustering algorithms for data fusion at the decision level is proposed. The results of automatic isolated word recognition, which are derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...

A NOVEL FUZZY-BASED SIMILARITY MEASURE FOR COLLABORATIVE FILTERING TO ALLEVIATE THE SPARSITY PROBLEM

Memory-based collaborative filtering is the most popular approach to build recommender systems. Despite its success in many applications, it still suffers from several major limitations, including data sparsity. Sparse data affect the quality of the user similarity measurement and consequently the quality of the recommender system. In this paper, we propose a novel user similarity measure based...

A Sharp Sufficient Condition for Sparsity Pattern Recovery

A sufficient number of linear and noisy measurements for exact and approximate sparsity pattern/support set recovery in the high-dimensional setting is derived. Although this problem has been addressed in the recent literature, there are still considerable gaps between those results and the exact limits of perfect support set recovery. To reduce this gap, in this paper, the sufficient con...

The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

Expensive feature engineering based on WordNet senses has been shown to be useful for document-level sentiment classification. A plausible reason for this performance improvement is the reduction in data sparsity. However, such a reduction could be achieved with less effort by means of syntagma-based word clustering. In this paper, the problem of data sparsity in sentiment analys...

Two-way Poisson mixture models for simultaneous document classification and word clustering

An approach to simultaneous document classification and word clustering is developed using a two-way mixture model of Poisson distributions. Each document is represented by a vector with each dimension specifying the number of occurrences of a particular word in the document in question. As a collection of documents across several classes usually makes use of a large number of words, the docume...

Publication date: 2013
